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Abstract — In  this  paper,  some  detection-theoretic,  estima¬ 
tion-theoretic,  and  information-theoretic  methods  are  investigated 
to  analyze  the  problem  of  determining  resolution  limits  in  imaging 
systems.  The  canonical  problem  of  interest  is  formulated  based  on 
a  model  of  the  blurred  image  of  two  closely  spaced  point  sources 
of  unknown  brightness.  To  quantify  a  measure  of  resolution  in 
statistical  terms,  the  following  question  is  addressed:  “What  is  the 
minimum  detectable  separation  between  two  point  sources  at  a 
given  signal-to-noise  ratio  (SNR),  and  for  prespecifled  probabil¬ 
ities  of  detection  and  false  alarm  (P^  and  P/)?”.  Furthermore, 
asymptotic  performance  analysis  for  the  estimation  of  the  un¬ 
known  parameters  is  carried  out  using  the  Cramer-Rao  bound. 
Although  similar  approaches  to  this  problem  (for  one-dimensional 
(1-D)  and  oversampled  signals)  have  been  presented  in  the  past, 
the  analyzes  presented  in  this  paper  are  carried  out  for  the  general 
two-dimensional  (2-D)  model  and  general  sampling  scheme.  In 
particular  the  case  of  under-Nyquist  (aliased)  images  is  studied. 
Furthermore,  the  Kullback-Liebler  distance  is  derived  to  further 
confirm  the  earlier  results  and  to  establish  a  link  between  the 
detection-theoretic  approach  and  Fisher  information.  To  study 
the  effects  of  variation  in  point  spread  function  (PSF)  and  model 
mismatch,  a  perturbation  analysis  of  the  detection  problem  is 
presented  as  well. 

Index  Terms — Aliasing,  Cramer-Rao  bound,  detection,  estima¬ 
tion,  Fisher  information,  imaging,  information-theoretic  imaging, 
Kullback-Liebler  distance,  model  mismatch,  perturbation  anal¬ 
ysis,  resolution,  variational  analysis. 


I.  Introduction 

The  problem  of  resolution  historically  has  been  of  signifi¬ 
cant  interest  in  different  communities  in  science  and  engi¬ 
neering,  for  example  in  astronomy,  optics,  different  applications 
in  physics,  array  processing  and  imaging.  In  this  paper,  we  focus 
on  the  problem  of  achievable  resolution  in  imaging  practice 
for  the  following  reasons.  First,  the  problem  of  resolving  point 
sources  can  be  considered  as  a  canonical  case  study  to  investi¬ 
gate  the  performance  of  imaging  systems  and  image  restoration 
techniques.  Second,  developing  different  techniques  to  resolve 
point  sources  is  indeed  a  concern  in  real-world  application,  e.g., 
astronomy.  Our  approach  to  this  problem  in  this  paper  is  based 
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Fig.  1.  Location  of  point  sources. 

on  statistical  detection  and  estimation  theory,  since  we  assume 
the  measured  signal  from  which  the  sources  are  to  be  resolved, 
is  noise-corrupted. 

To  begin,  let  the  original  image  of  interest  consist  of  two  im¬ 
pulse  functions  positioned  at  points  {px,Py)  and  {qx^Qy)  with 
a  small  distance  of  d=  s/iPx  —  qx)^  +  iPy  —  qyy  apart  (Fig.  1). 
That  is,  the  signal  model  is 

I{x,y)  =  s/a6(x-px,y-py)  +  s/pS{x  +  qx,y  +  Qy).  (1) 

We  consider  the  following  two-dimensional  (2-D)  model  for  the 
measured  (discrete)  signal 

g(xk,  yi)  =  s(xk,  yi)  +  w{xk,  yi) 

=  ah{xk  -Px,yi  -Py) 

+  Ph{xk  +  qx-,yi  +  qy) 

+  w{x]„yi),  G  {1,2,...,A}  (2) 

where  h{x,  y)  is  the  blurring  kemeF  (representing  the  overall 
point  spread  function  (PSF)  of  the  imaging  system)  and 
w{xk,  yi)  is  assumed  to  be  a  zero-mean  Gaussian  white  noise^ 
process  with  variance 

According  to  the  classical  Rayleigh  criterion,  two  point 
sources  are  “barely  resolvable”  when  the  central  peak  of  the 
pattern  generated  by  one  point  source  falls  exactly  on  the  first 
zero  of  the  pattern  generated  by  the  second  one  [6].  See  Fig.  2 
for  a  depiction.  But  in  fact  under  certain  conditions  related  to 
signal-to-noise  ratio  (SNR),  resolution  beyond  the  Rayleigh 
limit  is  indeed  possible.  This  can  be  called  the  “super-resolu¬ 
tion”  limit. 

The  statistical  analysis  presented  in  this  paper  is  formulated 
based  on  the  ability  to  distinguish  whether  the  measured  image 
is  generated  by  one  point  source  or  two  point  sources.  Specifi¬ 
cally,  for  the  proposed  model  the  equivalent  question  is  whether 
the  parameter  d  is  equal  to  zero  or  not.  If  d  =  0,  then  we  only 
have  one  peak  and  if  d  >  1  then  there  are  two  well  resolved 

'For  convenience  and  ease  of  presentation  only,  we  assume  that 
the  point  spread  function  is  a  symmetric  function  throughout,  i.e., 
h{x,y)  =  h(-x,y)  =  h(x, -y)  =  h(-x,-y). 

^We  do  not  concern  ourselves  in  this  paper  with  the  case  of  photon-limited 
systems. 
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Fig.  2.  Resolvability  in  the  Rayleigh  sense. 

peaks  according  to  the  Rayleigh  criterion.  3  So  the  problem  of 
interest  revolves  around  values  of  d  in  the  range  of  0  <  ri  <  1. 
This  can  he  posed  as  a  hypothesis  testing  problem,  i.e., 

Tio  :  d  =  0  One  point  source 
Til  :  d>  0  Two  point  sources 

or  equivalently  (see  (4)  at  the  bottom  of  the  page).  The  problem 
of  resolution  from  the  statistical  viewpoint  has  been  well  studied 
in  the  past  [25],  [2],  [1],  [7],  [9].  The  majority  of  researchers 
have  used  the  Cramer-Rao  bound  to  analyze  the  problem  of 
resolution  to  study  the  mean-square  error  of  unbiased  estimators 
for  the  distance  between  the  sources  [8],  [9],  [1],  [21],  [22], 
[13],  [16].  In  addition,  some  papers  [8],  [2]  have  considered 
a  hypothesis  testing  approach  to  determine  the  resolvability  of 
sources  in  some  limited  cases  for  one-dimensional  signals. 

As  for  novel  contributions  in  this  paper,  we  formulate  the 
problem  of  interest  as  a  composite  hypothesis  test  and  derive  an 
explicit  relationship  between  the  minimum  detectable  separa¬ 
tion  and  the  required  SNR  for  any  sampling  strategy  including 
aliased  (sub-Nyquist)  images.  As  a  result,  a  quantitative  mea¬ 
sure  of  resolution  is  obtained  which  can  be  used  to  understand 
the  effect  of  model  parameters  (e.g.,  PSF  and  sampling  param¬ 
eters).  The  following  question  is  addressed:  what  is  the  min¬ 
imum  separation  between  two  point  sources  (maximum  attain¬ 
able  resolution)  that  is  detectable  with  high  probability  at  a  given 
SNR.  Besides,  general  analytical  results  are  derived  for  arbitrary 
sampling  schemes.  Our  previous  work  has  been  focused  on  the 
case  of  one-dimensional  (1-D)  signals  and  over-Nyquist  sam¬ 
pling  [14],  [20],  [19].  In  this  paper,  we  present  extensions  of 
these  results  to  2-D  and  to  the  under-sampled  (aliased)  case  (for 
a  general  PSF). 

We  put  forward  a  solution  to  (4)  for  its  most  general  case 
where  all  the  parameters  involved  in  the  signal  model  are  un¬ 
known  to  the  detector.  Furthermore,  we  consider  two  cases, 
where  the  value  of  noise  variance  (rr^)  is  known  and  where  it 
is  unknown.  For  both  cases  we  develop  corresponding  detec¬ 
tion  strategies  and  obtain  the  explicit  relationship  between  SNR 
and  the  parameters  in  the  model.  Also,  to  further  analyze  the 
problem  of  resolution,  such  frameworks  will  be  explained  in  the 
following. 

^Throughout  this  paper,  we  assume  without  loss  of  generality  that  d  =  1 
corresponds  to  the  Rayleigh  limit. 
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A  significant  question  which  has  not  been  addressed  in  the 
past  is  to  analyze  the  effect  of  a  known  or  unknown  perturbation 
of  the  PSF  on  the  detection  performance.  Variation  in  PSF  can  be 
caused  by  other  blurring  elements  in  the  system,  for  example  an 
out-of-focus  lens  or  atmospheric  or  underwater  turbulence.  We 
first  address  the  problem  of  finding  the  change  in  the  required 
SNR  due  to  a  variation  of  the  PSF  required  for  resolvability. 
This  will  help  us  to  analyze  sensitivity  to  model  inaccuracies. 

In  the  interest  of  completeness,  we  also  compute  the  Fisher 
Information  matrix  (and  Cramer-Rao  (CR)  bound)  in  closed 
form  for  two  different  cases.  The  Cramer-Rao  lower  bound  for¬ 
mulation  is  used  to  study  the  limits  to  attainable  precision  of 
estimated  distances  of  the  two  point  sources.  We  carry  out  this 
analysis  for  the  case  of  under-sampled  images  and  for  a  general 
PSF. 

Another  appealing  and  informative  analysis  is  to  compute 
the  symmetric  Kullback-Liebler  Distance  (KLD)  or  Divergence 
[12,  p.  26].  KLD  is  a  measure  of  discriminating  power  between 
two  hypotheses,  and  is  directly  related  to  the  performance  figure 
of  a  related  optimal  detector.  To  accurately  compute  the  KLD 
for  the  underlying  problem,  we  make  some  essential  extensions 
to  the  conventional  formula  of  approximating  the  KLD.  We  shall 
see  an  interesting  and  important  connection  between  the  KLD 
analysis  and  the  detection-theoretic  analysis. 

As  for  the  application  of  the  presented  analysis  in  this  paper, 
we  shall  see  that  the  framework  of  addressing  the  fundamental 
relationship  in  resolving  point  sources  (i.e.,  the  tradeoff  between 
SNR  and  resolvability)  will  also  suggest  a  corresponding  de¬ 
tector  which  can  be  well  used  in  practice.  In  other  words,  the 
established  performance  expression  is  simply  the  result  of  ap¬ 
plying  the  proposed  local  detector. 

The  organization  of  the  paper  is  as  follows.  In  Section  11, 
we  will  present  and  formulate  our  detection-theoretic  approach. 
The  asymptotic  performance  of  maximum  likelihood  estimate 
of  the  unknown  parameters  in  terms  of  the  Fisher  Information 
Matrix  will  be  discussed  in  Section  III.  In  Section  IV,  we  will 
study  an  information-theoretic  criterion  for  the  problem  of  in¬ 
terest.  We  will  explain  our  perturbation  and  variational  analysis 
for  the  underlying  detection  problem  and  discuss  the  effect  of 
model  mismatches  in  Section  V.  We  will  also  discuss  some  of 
the  practical  applications  and  possible  extensions  of  the  pro¬ 
posed  framework  in  Section  VI.  Finally,  some  comments  and 
conclusion  will  be  presented  in  Section  VII. 

IF  Detection-Theoretic  Approach 

In  the  test  (4),  when  the  model  parameters  are  unknown, 
the  probability  density  function  (pdf)  under  both  hypotheses 
is  therefore  not  known  exactly,  resulting  in  a  composite  hy¬ 
pothesis  test.  It  is  therefore  not  possible  to  form  the  standard 
likelihood  ratio  test.  A  common  approach  in  this  case  is  the 
generalized  likelihood  ratio  test  (GLRT).  GLRT  first  computes 
maximum  likelihood  (ML)  estimates  of  the  unknown  parame¬ 
ters,  and  then  will  use  the  estimated  values  to  form  the  standard 
Neyman-Pearson  (NP)  detector.  While  other  useful  (Bayesian) 


"^0  :  gixk,  yi)  =  (a  +  P)hixk,  yi)  +  w{xk,  yi) 

Hi  :  g{xk-,yi)  =  ah{xk  -Px,yi  -Py)  +  Ph{xk  +  qx,yi  +  Qy)  +  w{xk,yi)- 


(4) 
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methods  are  also  available  for  such  composite  problems,  our 
focus  will  be  on  GLRT-type  methods  because  of  less  restrictive 
assumptions  and  easier  computation  and  implementation;  but 
most  importantly,  because  locally  most  powerful  (LMP)  tests 
can  be  developed  for  the  range  of  small  d  (below  the  Rayleigh 
limit).  Also,  as  will  be  shown  later,  the  performance  of  such  a 
detector  is  very  close  to  that  of  an  ideal  detector,  to  which  the 
values  of  all  the  parameters  in  the  model  are  known.  Hence, 
the  performance  of  the  suggested  detector  can  be  reasonably 
considered  as  an  approximate  performance  bound  in  practice. 

Applying  the  GLRT  approach  to  the  problem  of  interest  di¬ 
rectly  will  produce  a  highly  nonlinear  test  statistic  (see  [20]). 
However,  since  the  range  of  interest  for  the  value  of  d  is  as¬ 
sumed  to  be  small  (below  the  Rayleigh  limit  d  <  1),  we  can 
benefit  from  approximating  the  model  of  the  signal  for  nearby 
point  sources.  The  approximate  model  is  obtained  by  expanding 
the  signal  in  a  Taylor  series  about  the  small  parameter  values 
around  (p^,  QxjPyj  Qy)  =  (0, 0, 0, 0).  By  introducing  the  partial 
derivatives  of  h{x,y)  as 


hiji^Xk,  yi) 


d^~^^h{x,  y) 
dx^dyi 


x=xk,y=yi 


(5) 


we  write  the  signal  model  in  the  following  (lexicographically 
scanned)  vector  form  (e.g.,  [h]/Ar+fc  =  h(xk,  yi)) 


s  =  H0 


(6) 


where 


H=[h,hio,  hoi,  h205  ho2,  hii] 


e  = 


Q.  -\-  fd 

-apx  +  PQx 
-apy  +  (dqy 

i  (apl  +  pql) 

I  (apl  +  pql) 

-  apxPy  +  fdqxqy  - 


(7) 

(8) 


We  elect  to  keep  terms  up  to  order  2  of  the  above  Taylor  expan¬ 
sion  which  gives  a  more  accurate  representation  of  the  signal 
and  avoids  trivial  approximations  in  cases  where  the  first-order 
terms  would  simply  vanish  [19].  This  approximation  leads  to 
a  linear  model  detection  problem  and  also  is  equivalent  to  the 
framework  of  locally  most  powerful  tests  [11,  p.  218].  We  first 
consider  the  case  where  the  noise  variance  is  known  to  the  de¬ 
tector.  By  substituting  the  above  approximated  model  into  (4), 
the  hypothesis  test  can  be  rewritten  as 


r  Tfo  :  Ae  =  0 

\  Tfi  :  Ae  7^  0 


(9) 


where 


A  = 


■0  1 

0  0 

0  0 

0  0 

.0  0 


0  0  0  0- 

10  0  0 
0  10  0 

0  0  10 

0  0  0  1. 


(10) 


The  test  (9)  is  a  problem  of  detecting  a  deterministic  signal  with 
unknown  parameters.  The  GLRT  for  the  approximated  model 
yields  [11,  p.  274]: 


T(g)  =  ^0^A^[A(H^H)-1A^]-1A0  (11) 


where  g  is  the  (lexicographically  scanned)  vector  form  of  the 
measured  signal  ([g]/Ar-tfc  =  g{xk,yi))  and 

B  =  (H^H)-iH^g  (12) 

is  the  unconstrained  maximum  likelihood  estimate  of  6.  For  any 
given  data  set  g,  we  decide  Tfi  if  the  statistic  exceeds  a  specified 
fhreshold 

T(g)  >  7-  (13) 


The  choice  of  7  is  mofivated  by  the  level  of  tolerable  false  alarm 
(or  false -positive)  in  a  given  problem,  but  is  typically  kept  very 
low.  It  is  worth  mentioning  that  since  the  hypothesis  test  in  (3)  is 
a  one-sided  test,  the  above  formulations  (the  Taylor  approxima¬ 
tion  and  the  generalized  likelihood  ratio  setup  for  the  problem 
in  (9))  can  be  viewed  as  a  locally  most  powerful  detector  [1 1,  p. 
218].  From  (11),  the  performance  of  this  detector  is  character¬ 
ized  by 


Pf  =  Qxlil)  (14) 

Pd  =  (15) 

A  =  4^^A^[A(H^H)-1A^]-1A0  (16) 

where  Q^2  is  the  right  tail  probability  for  a  Central  Chi-Squared 
pdf  with  h  degrees  of  freedom,  and  Q  is  the  right  tail 
probability  for  a  noncentral  Chi-Squared  pdf  with  5  degrees  of 
freedom  and  noncentrality  parameter  A.  For  a  specific  desired 
Pfi  and  Pf,  we  can  compute  the  implied  value  for  the  noncen¬ 
trality  parameter  from  (14)  and  (15).  We  call  this  value  of  the 
noncentrality  parameter  A(P/,Pd).  This  notation  is  key  in  il¬ 
luminating  a  very  useful  relationship  between  the  SNR  and  the 
smallest  separation  which  can  be  detected  with  high  probability, 
and  low  false  alarm  rate.  From  (16)  we  can  write 

1x2  =  — 0^A^[A(H^H)-iA^]-iA0.  (17) 

A{Pf,Pd) 

Also,  by  defining  fhe  SNR  as 

SNR  =  (i8) 


and  replacing  the  value  of  with  the  right-hand  side  of  (17), 
the  relation  between  the  parameter  set  6  and  the  required  SNR 
can  be  made  explicit:^ 


'^To  give  an  insight  into  the  terms  in  the  expression  for  required  SNR,  let  us 
denote 


H^H 


’  a  ' 
b  C 


(19) 


where  a  =  h^h  (20) 

b  =  [h^hio,  h^hoi,  h^h^o,  h^ho2,  h^hii]^  (21) 

C  =  AH^HA^.  (22) 


Then  we  can  show 


[A(H^H)-iA^]-i  =  C  -  ^  bb^.  (23) 

a 

This  form  gives  a  rather  better  intuition  to  the  established  relationship  between 
the  required  SNR  and  the  resolvability.  For  instance  it  can  be  readily  proved 
that  the  appearance  of  the  subtracted  term  in  the  denominator  of  (24)  (  -bb^) 
is  due  to  the  energy  of  point  sources  (a  -|-  /?)  being  unknown  to  the  detector.  In 
other  words,  if  only  the  value  of  a  -|-  /?  is  known  to  the  detector,  this  term  would 
vanish  and  the  related  term  in  the  denominator  would  be  given  by  0^A^CA0. 
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TABLE  I 


Case  where  ...  Effect  of  Pf  and  Pj,  is  given  by 

all  the  parameters  are  known  (33)  and  (34) 

all  the  parameters  except  the  noise  variance  are  unknown  (14)-(16)  and  (24) 

all  the  parameters  including  the  noise  variance  are  unknown  (29)-(31)  and  (32) 


SNR  =  X{Pf,  Pd) 


(24) 


The  above  analysis  can  be  well  extended  to  the  case  where  is 
unknown  a  priori.  The  corresponding  hypotheses  for  this  case 
are  given  by 


[Hq:  A0  =  0,  >  0 

\  Tfi  :  A0  7^  0,  >  0 


The  GLRT  for  (25)  [1 1,  p.  345]  gives  the  following  test  statistic: 


^  g^A^[A(H^H)-iA^]-iAg 


(26) 


where  subscript  “u”  denotes  the  case  of  unknown  noise  variance, 
I  is  the  identity  matrix,  and  0  is  the  same  unconstrained  max¬ 
imum  likelihood  estimation  of  0  as  in  (12).  For  any  given  data 
set  g,  we  decide  Tii  if  the  statistic  exceeds  a  specified  threshold 


T.(g)  >  y. 


(27) 


From  (11),  the  performance  of  this  detector  is  characterized  by 
[11,  p.  186] 

p/  =  QE.,._yy)  (28) 

7'd  =  (3E'_^_,(A„)(y)  (29) 

=  ^0^A^[A(H^H)-iA^]-iA0  (30) 

where  is  the  right  tail  probability  for  a  Central  F  dis¬ 

tribution  with  five  numerator  degrees  of  freedom  and  N  —  6  de¬ 
nominator  degrees  of  freedom;  and  Qp^  ^  g(A„)  is  ihe  right  tail 
probability  for  a  noncentral  F  distribution  with  five  numerator 
degrees  of  freedom  and  N  —  6  denominator  degrees  of  freedom, 
and  noncentrality  parameter  A„. 

In  this  GLRT  context,  the  following  relation  between  the  pa¬ 
rameter  set  0  and  the  required  SNR  (denoted  by  a  subscript  “u”) 
can  be  obtained 


SNR„ 


^u{Pf,Pd) 


0^A^[A(H^H)-iA^]-iA0 


(31) 


which  mirrors  (24),  with  the  only  difference  in  performance 
being  the  change  of  coefficient  from  X(Pf,Pd)  to  Xu{Pf,  Pd)- 
It  can  be  easily  verified  that  X(Pf,Pd)  <  Xu(Pf,Pd)  for  Pd  > 

Pf- 

In  either  case,  an  important  question  is  to  consider  how  dif¬ 
ferent  these  obtained  performance  is  from  that  of  the  “ideal” 
clairvoyant  detector,  to  which  all  the  parameters  (i.e.,  0  and 
are  known.  We  first  note  that  in  this  case  the  hypothesis  test  in 
(9)  will  be  a  standard  linear  Gauss-Gauss  detection  problem. 
Also,  we  can  further  simplify  the  problem  by  seeing  that  the 
term  (a  -|-  /3)h  in  the  signal  model  in  (6)  is  a  common  known 
term  under  both  hypotheses  and  can  be  removed.  As  a  result. 


the  following  relationship  can  be  obtained  for  the  completely 
known  case: 


sNRid  =  yp/,py 


0^n^m 

Fa^ah^ha^ 


(32) 


where  the  subscript  “id”  denotes  the  ideal  case  and  r](Pf,Pd)  is 
the  required  deflection  coefficient  [11,  p.  71]  computed  as 


r,  =  (Q-\Pf)-Q-\Pd))^  (33) 

where  Q~^{-)  is  the  inverse  of  the  right-tail  probability  func¬ 
tion  for  a  standard  Gaussian  random  variable  (zero  mean  and 
unit  variance).  Comparing  the  expression  in  (24)  to  that  of  (33) 

SNR  _X{Pf,Pd)  0^ A0 
SNRid  “  viPf,Pd)  0^A^[A(H^H)-iA^]-iAe 

where  we  note  that  r](Pf,Pd)  <  X{Pf,  Pd),  provided  Pd>  Pf. 
Also,  it  can  be  proved  that  AH^HA^  —  [A(H^H)“^A^]“^ 
is  a  positive  definite  matrix.  As  a  result,  as  expected,  SNR^^  > 
SNR  >  SNRid,  always.  However,  as  we  will  demonstrate  in 
one  of  the  following  sections,  the  difference  between  SNR^^  and 
SNRid  is  quite  small  over  most  of  the  parameter  range.  Also,  in 
[19]  we  have  shown  that  in  the  limiting  cases  (where  intensities 
are  known  and  equal),  the  proposed  detector  indeed  produces 
a  uniformly  optimal  test  (same  as  the  ideal  detector).  Further¬ 
more,  for  the  general  case  as  we  will  show  later,  the  difference 
between  performances  of  the  proposed  detector  and  ideal  de¬ 
tector  is  very  small.  Therefore,  we  argue  that  the  GLRT  frame¬ 
work  used  to  suggest  a  detector  can  be  reasonably  accounted  to 
set  a  performance  limit  in  practice. 

For  clarity’s  sake,  we  summarize  the  dependence  of  perfor¬ 
mance  on  Pf  and  Pd  for  the  three  cases  discussed  above  in 
Table  I. 

A.  Ejfects  of  Sampling  Schemes  on  Performance 

The  expressions  in  (24)  or  (31)  are  in  general  applicable  to 
any  sampling  schemes  including  nonuniformly  sampled  or  un¬ 
dersampled  (aliased)  images.  However,  since  the  energy  of  a 
bandlimited  signal  in  the  continuous  domain  is  that  of  its  uni¬ 
formly  and  supercritically  discretized  version  divided  by  the 
sampling  rate,  the  right-hand  side  of  these  expressions  should 
be  understood  more  generally  as  depending  upon  the  sampling 
offsets  (phases)  of  the  discrete  images.  In  particular,  for  under¬ 
sampled  images,  the  energy  terms  in  will  vary  signifi¬ 

cantly  as  sampling  phase  changes.  We  will  study  this  effect  in 
Section  II-A2. 

An  interesting  related  question  is  how  the  availability  of  mul¬ 
tiple  observations  of  the  image  (i.e.,  several  frames  with  dif¬ 
ferent  sampling  phases)  affects  the  performance.  Let  Eli  denote 
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Fig.  3.  Normalized  1-D  cut  of  the  point  spread  functions  used  to  present  the  results  in  this  paper. 


the  /th  set  of  acquired  samples  (i.e.,  Zth  frame  in  a  video  se¬ 
quence)  out  of  a  total  of  L  frames  and  let  H;  and  SNR;  repre¬ 
sent  the  corresponding  H  and  the  required  SNR  of  the  Zth  image. 
Then,  the  overall  required  SNR  is  given  hy 


SNRi  =  A(Py,Pd) 


(35) 


where 


L 


1=1 


(36) 


Furthermore,  it  can  he  proved  that 

1  ^ 

SNRi<-5]SNR,  (37) 

^  i=i 

with  equality  sign  for  the  case  of  oversampled  frames.  The 
reason  behind  this  is  that  the  energy  of  signal  in  over-Nyquist 
case  is  a  constant  value  and  does  not  depend  on  the  phase  of 
sampling.  Hence,  each  (supercritically  sampled)  frame  has  the 
same  effect  on  the  detection  performance.  In  other  words,  in 
this  case  for  1  <  m,n  <  L. 

However  in  under-Nyquist  case,  some  frames  (due  to  better 
placement  of  samples)  provide  more  information  for  de¬ 
tectability  than  others. 

In  the  following  pages,  we  further  analyze  the  performance 
results  obtained  earlier  for  oversampled  images.  Next  we  look 
into  a  case  where  the  underlying  image  is  undersampled  to  see 
the  effect  of  aliasing  on  the  performance.  Having  earlier  derived 
the  general  expression  for  the  performance,  to  facilitate  the  pre¬ 
sentation,  in  what  follows,  we  will  often  consider  a  particular 
case  with  the  following  set  of  conditions  (we  may  call  this  the 
symmetric  case)  as  follows: 

•  Px  =  PyjQx  =  Qy  =  0-, 


•  a  =  /?  =  1; 

•  h{x,y)  =  h{^J +  y^)  =  h{r)  (angular  symmetric 
kernel). 

Some  examples  of  angular  symmetric  kernels  which  will  be 
used  later  in  this  paper  are  “jinc-squared”  and  Gaussian  win¬ 
dows.  The  former  (jinc^(r))  is  the  PSF  resulting  from  a  single 
circular  aperture  and  is  characterized  by 


jinc(r)  = 


p27r 

=  /  +  r  cos  9)\d9 

— 1  Jo 


(38) 


27rr\/^  , 

_  Ji(27rr) 

2'Kr 

where  Ji  ( • )  is  the  first  order  Bessel  function  of  the  first  kind  [6] . 
The  Gaussian  kernel,  on  the  other  hand,  can  be  considered  as  a 
typical  approximation  of  the  overall  effect  of  various  elements  in 
the  imaging  systems  (including  aperture,  charge-coupled  device 
(CCD),  out  of  focus  lens,  atmospheric  or  underwater  turbulence, 
etc.);  it  is  given  by 


h{r) 


(39) 


A  plot  of  the  above  kernels  is  shown  in  Fig.  3,  where  we  note  the 
corresponding  Rayleigh  spacings  for  both  functions  are  1  (this 
corresponds  to  ar  =  0.35  in  (39). 

1 )  Over-Nyquist  Sampling:  In  this  section,  we  further  sim¬ 
plify  the  earlier  results  for  the  over-Nyquist  case.  To  begin,  we 
note  that  the  energy  terms  can  be  written  as 


P  =  H^H 

r  Eo  0 

0  E’lo 

_  0  0 

—Eio  0 
—Eqi  0 
.  0  0 


0  — i?io  —Eq\ 

0  0  0 

£■01  0  0 

0  £20  £11 

0  £11  £02 

0  0  0 


0 

0 

0 

0 

0 

£11 


(40) 
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Pd=0.99  Pfa=10“2 


Fig.  4.  Minimum  detectable  d  as  a  function  of  SNR  (in  dB)  (just  above  the  Nyquist  rate);  Gaussian  PSF;  (31)  and  (32). 


where 


p  _ Vi^Vi 


=  I  /  u'^v^H{u,v)\‘^dudv 


r+oo  1^+00  r 


=  f! 


'  —oo  J  — oo 


9*+-^7i(a;,  y) 
dx^dy^ 


n  2 


dxdy. 


Q  =  A^A(H^H)-iA^]-iA 


ro  0 

0  E’lo 
0  0 
0  0 


0 

LO 


0 

0 

Em 

0 

0 

0 


0 

0 

0 

Tp2 


0 

0 

0 

Eii-^^ 

^02 -f 


0  1 
0 
0 
0 
0 

-I 

(43) 


^For  instance 


/■+”  /■+”,,  .dh(x,y) 

7  7 

«/— CO  j — CO 

/+C50  r+co  2 

-OO  j  —  CO  ^ 

dh^(x,  y) 


dxdj/ 


/ 


dx 
+  00  2 


-AxAy 


h^{x,y) 


dt/  =  0 


(42) 


since  the  PSF  has  finite  energy  and  lim^j^ico  h{x,  y)  =  0. 


J 


(41)  Fig.  5.  Sampling  phases  in  x  and  y  directions  (<j>  and  ip;  dots  indicate  the  lo¬ 
cations  at  which  signal  is  being  sampled). 


The  zero  elements  in  (40)  are  due  to  the  orthogonality  of  some  of 
the  partial  derivatives  with  each  other.5  With  the  above  notation, 
we  will  also  have 


Let  us  now  consider  the  symmetric  case.  For  the  oversampled 
case,  as  c?  —S’  0,  we  will  have 


SNR 


HPf,Pd)  64 


E;o 


where  we  can  show  that 

^+oo 

,2 

77 

/O 


Eq  — 

ElO  =  TT 


A2  d^  E0E20  -  Elo 

rh^{r)dr 


p  _ ^ 

^20  -  T 


r+oo 

r 

D 

j.+oo  3 

n  r 


dhjr) 

dr 


n  2 


dr 


'dhir)' 

2 

+  3r 

d‘^h(r) 

dr 

dr‘2 

(44) 

(45) 

(46) 

dr  (47) 


as  energy  terms.  See  Appendix  A  for  detailed  calculation  of 
these  energy  terms. ^  Fig.  4  shows  the  minimum  detectable  d 
versus  SNR  for  the  proposed  local  detector  in  (31)  and  the  ideal 

®We  note  that  due  to  the  Cauchy-Schwartz  inequality  Effy  <  E0E20,  the 
right-hand  side  of  (44)  is  always  positive.  Also,  as  discussed  in  [20],  this  term 
will  vanish  if  the  amplitude  of  the  original  scene  (a  +  /?)  is  known  to  the  de¬ 
tector.  This  is  to  say 


SNR  ; 


\{Pf,Pd)  64  Eq 


(48) 
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Po=0.99,  PpA=10-2,  jinc2  PSF 


SNR(dB) 

Fig.  6.  Minimum  detectable  d  as  a  function  of  SNR  (in  dB);  best,  worst,  and  average  performance  over  the  possible  range  of  sampling  phases  (one  set  of  uniform 
samples  50%  below  the  Nyquist  rate);  GLRT  detector;  h{r)  =  jinc^(r);  known  rr^;  (24). 

Pq=0.99,  Pp^=10"2,  Gaussian  PSF 


SNR(dB) 

Fig.  7.  Minimum  detectable  <7  as  a  function  of  SNR  (in  dB);  best,  worst,  and  average  performance  over  the  possible  range  of  sampling  phases  (one  set  of  uniform 
samples  50%  below  the  Nyquist  rate);  GLRT  detector;  Gaussian  kernel;  known  rr^ ;  (24). 

detector  for  the  case  of  Gaussian  PSF  (symmetric)  and  over-  Interestingly,  the  inverse  proportionality  of  the  fourth  root 
Nyquist  sampling.  The  former  detector  has  been  suggested  for  of  SNR  and  resolution  (separation  between  point  sources)  was 
the  case  where  all  the  model  parameters  including  the  noise  vari-  been  reported  earlier  by  our  paper  [19]  and  other  researchers  in 
ance  is  unknown  to  the  detector.  However  as  can  be  seen,  for  any  different  frameworks  in  array  processing  [21],  [22],  [13],  [16] 
given  d,  the  difference  between  the  required  SNR  for  these  de-  and  also  in  optical  imaging  applications  [8],  [9],  [1],  [19]  and 
lectors  is  at  most  3-4  dB.  references  therein. 
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Pq=0.99,  Pfa“'*0  jinc^  PSF,  Average  performance 


Fig.  8.  Minimum  detectable  d  as  a  function  of  SNR  (in  dB);  average  performance  over  the  possible  range  of  sampling  phases  (one  set  of  uniform  samples  at 
different  sampling  rates);  GLRT  detector;  jinc-squared  kernel;  known  rr^;  (24). 


Pp=0.99,  PpA^IO  jinc^  PSF,  Worst  Case  performance 


Fig.  9.  Minimum  detectable  d  as  a  function  of  SNR  (in  dB);  worst  case  performance  over  the  possible  range  of  sampling  phases  (one  set  of  uniform  samples  at 
different  sampling  rates);  GLRT  detector;  jinc-squared  kernel;  known  rr^;  (24). 
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2)  Under -Nyquist  Sampling:  Supercritical  sampling  of  a 
bandlimited  function  (e.g.,  h{x^  y)  and  its  derivatives)  preserves 
its  energy  by  a  factor  related  to  tbe  sampling  rate,  regardless  of 
sampling  offset.  However,  for  tbe  under-Nyquist  (aliased)  case 
every  element  of  H^H  will  be  a  function  of  sampling  phases 
in  X  and  y  directions  (we  call  these  sampling  phases  (p  and  i/;, 
see  Fig.  5).  This  leads  to  a  rather  complicated  expression  for 
the  required  SNR  versus  d.  Let  us  define  a  number  of  terms 
which  will  be  used  to  describe  the  dependence  of  sampling  pa¬ 
rameters  on  different  variables  and  matrices.  Suppose  H(u,v) 
is  band  limited  to  <  u  <  Bu  and  —B^  <  v  <  B^  in  the 
frequency  domain  and  define  and  Ly  = 

where  [( •  )1  denotes  the  smallest  integer  greater  than  or  equal 
to  its  argument.  To  clarify  further,  let  Pgub  be  the  product 
H^H  for  the  under-Nyquist  case.  In  general,  this  matrix  will 
have  the  following  form^: 


Lu 


Psub  -  p + 


m=0  n=0 


P™’”"  cos{2Trm(pi  +  2'Kn'ipi) 


1=0 

L 


-l-pm,n  sm_{2'Km(j)i  -\-  2'Kn'ipi) 


1=0 


(49) 


where  P  is  defined  in  (40)  (for  the  over-sampled  case)  and 
pm,n>s  and  P™’"^’s  are  the  matrices  resulting  from  the  aliased 
components. 

Figs.  6  and  7  show  the  minimum,  maximum  and  average 
values  of  the  required  SNR  over  the  possible  range  of  sampling 
phases  for  jinc-squared  and  Gaussian  kernels,  respectively. 
These  curves  are  generated  by  using  (24),  which  is  the  ex¬ 
pression  for  the  required  SNR  corresponding  to  the  proposed 
detector  in  (26).  For  a  given  value  of  d,  the  required  SNR  is 
computed  for  the  values  of  (p  and  'tp  in  the  range  of  [0,  27r]. 
Then  the  maximum,  minimum,  and  average  of  these  values  are 
computed. 

The  curves  are  shown  for  the  case  where  only  one  set  of  uni¬ 
formly  sampled  data  is  available.  Nevertheless,  it  can  be  proved 
that  the  maximum,  minimum,  and  average  values  of  the  resolv¬ 
ability  (or  required  SNR)  remain  the  same  for  arbitrary  number 
of  frames.  This  follows  by  noting  that  the  number  of  frames 
or  the  sampling  rate  are  embedded  inside  the  “SNR”  on  the 
left-hand  side  of  (24)  and  (31).  For  instance,  for  resolving  a  par¬ 
ticular  separation,  doubling  the  sampling  rate  does  not  change 
the  required  SNR,  but  rather  implies  that  the  same  detection  per¬ 
formance  can  be  achieved  with  twice  the  noise  variance  as  com¬ 
pared  to  the  original  sampling  rate. 

It  is  seen  that  the  required  SNR  for  the  case  where  the  PSF 
is  a  Gaussian  is  on  average  3  dB  (and  16  dB  at  “worst  case”) 
more  than  for  that  for  the  case  of  simple  circular  aperture.  This 
can  be  explained  by  noting  that  the  jinc-squared  kernel  con¬ 
tains  more  energy  in  its  second  derivative.  The  more  energy  in 
the  second  partial  derivative  means  bigger  difference  between 
the  PDF’s  under  the  two  hypotheses  and  therefore  better  de¬ 
tectability.  Also,  we  note  that  the  jinc-squared  window  has  a 
larger  effective  bandwidth  which  lets  more  high  frequency  in¬ 
formation  through.  This  phenomenon  will  also  be  observed  in 
the  following  sections. 

Finally,  Figs.  8  and  9  show  the  average  and  maximum  resolv¬ 
ability  at  different  sampling  rates  below  Nyquist.  As  observed. 


^See  Appendix  B. 


change  of  sampling  rate  has  much  less  effect  on  the  average 
performance  (i.e.,  the  required  total  SNR)  compared  to  that  of 
the  performance  at  worst  case  (Fig.  9).  These  worst  cases  occur 
when  at  some  sampling  scenarios  (e.g.,  at  50%  Nyquist),  the 
discrete  measured  signal  includes  two  strong  peaks  located  far 
from  each  other  (or  roughly  speaking,  where  the  acquired  sam¬ 
ples  are  far  from  the  point  sources).  This  phenomenon  clearly 
degrades  the  performance  of  the  detector.  On  the  other  hand, 
if  there  exist  some  samples  which  are  positioned  closely  to  the 
point  sources,  the  detector  collects  more  information  about  the 
underlying  signal. 


III.  Estimation-Theoretic  Approach,  Fisher  Information 


To  complement  the  earlier  results,  in  this  section,  we  carry  out 
the  Fisher  Information  derivations  for  the  general  signal  model. 
Many  papers  have  computed  the  Cramer-Rao  bound  to  study 
the  mean-square  error  of  unbiased  estimators  for  the  distance 
between  the  point  sources  [8],  [9],  [1],  [21],  [22].  The  result  of 
the  CRLB  analysis  assists  us  to  first  confirm  the  earlier  results 
obtained  by  the  detection-theoretic  approach,  to  better  under¬ 
stand  the  effect  of  estimation  accuracy  on  the  performance  of 
the  local  detector  developed  in  Section  II,  and  finally  to  derive 
the  KLD  in  Section  IV,  when  discussing  the  information-the¬ 
oretic  framework.  We  present  the  analysis  for  the  case  where 
Bu  <  fs  <  ‘^Bu  and  By  <  fg  <  2By  and  when  two  frames 
are  measured,  with  sampling  phases  of  (cpi^ipi)  and  (<?!>2,'02)> 
respectively.  As  such,  the  vector  of  unknown  parameters  of  the 
signal  model  in  (2)  is 

r  -1  T 


t  = 


Px^  Qx^Pyt  Qy}  Cl)  /?)  </’2)  4^2 

'■ - V - ^  ' - - - ' 


(50) 


L  t,-  ts  J 

in  which  we  identify  two  sets  of  parameters:  parameters  of  in¬ 
terest  (tr)  and  nuisance  parameters  (tg).  We  have  assumed  cpi 
and  'ipi  to  be  known,  since  the  first  frame  can  be  considered  as 
the  reference  frame. 

The  CRLB  for  the  separation  d  (i.e.,  a  nonlinear  function  of 
t)  is  given  by  (51)-(52)  shown  at  the  bottom  of  the  next  page 
[10]  where  A  is  the  Fisher  information  matrix  defined  by 


[A]ii  =  -E 


g^lnp(g,t) 

dtidtj 


—  ah{xk  -  Px,yi  -  Py) 


+Ph{xk  +  qx,yi  +  qy)-^ah{xk  -Px,yi  -  Py) 


+Ph{xk  +  qx,yi  +  qy) 


(53) 


where  p(  • )  denotes  the  probability  density  function  of  the  mea¬ 
sured  signal  defined  by  (2).  The  matrix  A  can  be  partitioned 
with  respect  to  the  parameter  sets  ty  and  tg  as 


A  = 


A^g 


A^^g 

Agg 


(54) 


The  derivation  of  the  Fisher  information  matrix  for  the  general 
sampling  scheme  is  presented  in  Appendix  C.  For  the  over- 
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Fig.  10.  iyCRLB((i)  versus  d;  Maximum,  minimum  and  average  values  over  the  possible  range  of  sampling  phases  resulted  from  two  sets  of  uniform  samples 
50%  below  the  Nyquist  rate  (Periodically  nonuniform  sampling);  a  =  /3  =  1;  Gaussian  PSF;  known 


Nyquist  case,  we  note  that  the  summations  in  Fisher  informa¬ 
tion  matrix  can  he  simply  substituted  hy  continuous  integra¬ 
tions.  Furthermore  these  integrations  can  he  rather  easily  com¬ 
puted  in  the  frequency  domain  for  a  given  point  spread  function 
(see  [20]  for  some  examples). 

As  for  the  under-Nyquist  case,  similar  to  the  earlier  calcula¬ 
tion  of  energy  terms  in  Section  II,  we  can  see  that  the  Fisher 
Information  Matrix  has  the  following  components 

Lu  div  L 

Asub  =  ^  +  XI  X  cos(27rn()()z  -f  2-Kmi;i) 

n=0  m=0  .  1=0 


L 

-|-Ag sm(2'Kn(f>i  +  2'Kmipi) 
/=o 


(55) 


where  A  is  the  Fisher  information  matrix  provided  there  is  no 
aliasing  (i.e.,  same  as  what  was  computed  for  the  over-Nyquist 
case)  and  A^’^^’s  and  A^’^^’s  are  the  related  matrices  due  to 
aliasing.  Similar  to  Section  II,  for  a  given  value  of  d,  the  square 


root  of  the  CRLB  is  computed  for  the  values  (p  and  i/;  in  the 
range  [0, 27r].  Then  the  maximum,  minimum  and  average  values 
are  calculated  to  he  shown  in  the  following  figures.  For  example, 
these  values  are  displayed  in  Fig.  10  for  a  Gaussian  kernel  when 
a  =  (3  =  1.  As  seen,  the  estimation  task  becomes  much  harder 
as  d  decreases. 

Figs.  1 1  and  12  show  the  average  of  the  square-root  of  CRLB 
versus  d  for  two  cases,  where  a  =  (3  and  where  a  ^  j3.  For  the 
case  where  the  PSF  is  jinc-squared,  the  estimation  accuracy  is 
better,  due  to  the  larger  energy  in  the  higher  order  derivatives  of 
a  jinc-squared  function.  The  effect  of  the  difference  between  in¬ 
tensities  (a  —  (3)  on  the  estimation  variance  is  shown  in  Figs.  13 
and  14.  These  curves  indicate  that  the  estimation  task  is  harder 
for  the  case  of  unequal  intensities,  as  expected. 

So  far  we  have  explored  some  detection-theoretic  and  esti¬ 
mation-theoretic  approaches  to  the  problem  of  achievable  res¬ 
olution.  We  have  investigated  the  corresponding  performance 
figures  (required  SNR  for  a  specific  resolvability  and  the  lower 
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Fig.  1 1 .  ^CRLB(d)  versus  d;  average  values  over  the  possible  range  of  sampling  phases  resulted  from  two  sets  of  uniform  samples  50%  below  the  Nyquist  rate 
(Periodically  nonuniform  sampling)  for  Gaussian  and  jinc-squared  PSFs;  a  =  /?  =  1;  known  a'^. 


Fig.  12.  ^CRLB((i)  versus  d;  average  values  over  the  possible  range  of  sampling  phases  resulted  from  two  sets  of  uniform  samples  50%  below  the  Nyquist  rate 
(Periodically  nonuniform  sampling)  for  Gaussian  and  jinc-squared  PSFs;  a  =  1.5, /?  =  0.5;  known 


bound  on  error  of  estimating  the  separation).  In  the  following 
section  we  use  (and  extend)  a  well-known  information-theoretic 
measure  in  distinguishing  two  hypotheses.  This  measure  nicely 
links  the  asymptotic  detection  performance  to  the  Fisher  infor¬ 
mation  derived  in  this  section. 


IV.  Information-Theoretic  Analysis, 
Kullback-Liebler  Distance 
In  the  interest  of  completeness  and  also  verifying  the  earlier 
result  from  yet  another  perspective,  we  investigate  the  problem 
of  the  achievable  resolution  by  an  information-theoretic 
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cl=0.03;  ©2=1 


Fig.  13.  iyCRLB(d)  versus  a(=  2-/3);  average  values  over  the  possible  range  of  sampling  phases  resulted  from  two  sets  of  uniform  samples  50%  below  the 
Nyquist  rate  (Periodically  nonuniform  sampling)  for  Gaussian  and  jinc-squared  PSFs;  d  =  0.03 ;  known  . 


d=0.3;  ©2=1 


Fig.  14.  Y^CRLB(d)  versus  a(=  2-/3);  average  values  over  the  possible  range  of  sampling  phases  resulted  from  two  sets  of  uniform  samples  50%  below  the 
Nyquist  rate  (Periodically  nonuniform  sampling)  for  Gaussian  and  jinc-squared  PSFs;  d  =  0.03;  known 


approach.  Namely,  we  compute  the  symmetric  Kull- 
hack-Liehler  Distance  (KLD)  or  Divergence  (j7)  [12,  p. 
26]  for  the  underlying  hypothesis  testing  problem.  KLD  is 
a  measure  of  discrimination  between  two  hypotheses,  and 
can  be  directly  related  to  the  performance  of  the  optimal 
detector.  However,  since  KLD  analysis  does  not  take  the 
effect  of  nuisance  parameters^  into  account,  it  will  indicate 


a  somewhat  loose  bound  on  the  detection  performance  for 
our  problem. 

Quantitatively,  KLD  is  related  to  detection  performance  in  the 
following  way.  Asymptotically,  as  the  number  of  independent 
samples  (N)  grows  to  infinity  [12] 

^Parameters  which  are  unknown  to  the  detector  but  are  common  under  both 
hypotheses. 
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(56) 

To  gain  better  insight,  we  first  carry  out  an  analysis  for  the 
case  where  the  point  sources  are  symmetric  (that  is  to  say 
=  q^^Py  =  Qy  and  a  =  j3).  Having  computed  this  simpler 
case,  we  will  extend  the  result  to  the  general  case.  To  begin, 
let  p(g,  d)  and  p(g,0)  be  the  PDF’s  of  the  measured  signal 
under  hypotheses  T(o  and  Tii  in  (4).  Then,  we  will  have  (see 
Appendix  D) 

J (d)  =  d)  -  p(g,  0)]  log  dg  (57) 

~  ^2^2^  I  Qd2 

k  I  ^ 

=  ^h^2ho2  (58) 

as  d  — *■  0,  where  V  is  the  observation  (signal)  space.  We  note 
that  the  KLD  measure  behaves  as  the  minimum  detectable  d 
raised  to  the  power  of  4  (confirming  the  power  law  we  have 
derived  for  the  inverse  of  the  required  SNR  in  earlier  sections.). 

Now,  let  us  consider  the  more  general  model  of  unequal  and 
asymmetric  point  sources.  First,  we  observe  from  the  above 
analysis  that  for  the  underlying  problem,  KLD  computation 
requires  an  extension  to  higher  order  terms  (see  Appendix  D 
again).  To  this  end,  we  extend  the  (low  order)  formula  typically 
used,  as  in  for  example  [12,  p.  26] 


(6).  Interestingly,  the  above  framework  shows  that  computing 
KLD  in  the  context  of  [12]  (that  is,  an  approximation  based  on 
small  variations  of  parameter(s)  of  interest)  is  in  spirit  similar  to 
computing  the  original  KLD  for  the  approximated  model.  The 
latter  approach,  of  course,  does  not  require  any  concern  about 
higher  order  terms. 

For  an  alias-free  signal  (61)  can  be  further  simplified  to  (62) 
shown  at  the  bottom  of  the  page  whereas  for  the  aliased  case, 
the  right-hand  side  of  (61)  will  depend  on  the  sampling  phases. 

For  undersampled  image,  similar  to  earlier  analyzes,  KLD 
varies  with  sampling  phases 


Jsuhio)  =  j{e) 

+  EE 

7n=0  n=0 


L 

cos(2Trm(f>i  +  2Tm'tpi) 

1=0 


L 

-1- ^  sm(2Trm(f>i  +  27rni/;/) 
1=0 


(63) 


where  J^(0)  is  the  KLD  for  actual  over-sampled  image  and 
and  are  the  related  terms  caused  by 

aliasing.  As  sampling  rate  increases,  the  terms  resulting  from 
aliasing  -|-  n  >  0)  will  vanish  and  the  expression 

in  (62)  is  obtained.  Fig.  15  shows  variation  of  KLD  (63)  over 
the  range  of  sampling  phases  for  the  symmetric  case.  We  have 
used  the  expression  in  (57)  to  find  the  maximum,  minimum  and 
average  values  of  KLD  for  any  given  separation  d  below  the 
Rayleigh  limit. 


J(tr)  ^  (59) 


by  a  second-order  approximation 
i  ~  A  I  t- 

yXj'P j  =0 


+  [col  (M^)] 


d'^Ar 


dtl 


[col  (tr-t^)]  (60) 


tr=0 


where  col(  • )  denotes  the  lexicographical  (columnwise)  scan¬ 
ning  operator.  After  some  algebra  (60)  will  lead  to^ 


J(0)  «  -^0^A'^AH'^HA'^A0  (61) 

This  is  again  in  general  applicable  to  any  arbitrary  sampling 
scenario  and  point  spread  function.  It  is  worth  mentioning  that 
the  matrix  A^AH^HA^A  is  in  fact  the  Fisher  information 
for  the  parameter  set  0  in  the  quadratic  approximated  model  in 

®Note  the  difference  between  8  and  t,.  which  are  defined  in  (8)  and  (50), 
respectively. 


V.  Variations  in  Point-Spread  Function 

The  purpose  of  this  section  is  to  analyze  how  other  parameters 
in  real-world  imaging  can  affect  the  achievable  resolution.  The 
analysis  here  helps  us  to  compute  the  effect  of  small  unmodeled 
variations  in  PSF  or  changes  of  PSF  by  other  blurring  functions 
(e.g.,  the  effect  of  lens  or  charge  coupled  devices  (CCD)).  In  this 
section  we  are  interested  in  studying  three  cases.  In  the  first  case 
we  present  an  analysis  for  an  imaging  system  in  which  samples 
are  acquired  through  a  CCD.  We  study  how  the  detection  perfor¬ 
mance  is  affected  by  using  such  a  system  as  compared  to  the  ide¬ 
alized  point  sampling.  For  the  second  case,  we  study  the  effect 
of  variations  in  PSF  on  the  resolvability  in  a  general  framework. 
We  assume  that  the  variation  in  PSF  is  known  to  the  detector  and 
we  derive  the  sensitivity  of  the  required  SNR  versus  the  small 
change  in  PSF.  The  third,  and  perhaps  most  important  case,  is 
the  scenario  where  the  model  (PSF)  based  on  which  we  design 
our  detector  is  slightly  different  from  the  actual  PSF.  The  result 
of  these  analyzes  will  (for  example)  help  quantify  the  impor¬ 
tance  of  precisely  knowing  the  blurring  kernel  on  the  resolving 
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Fig.  15.  Maximum,  minimum  and  average  values  of  KLD  versus  d  for  the  possible  range  of  sampling  phases  (50%  below  the  Nyquist  rate);  h{r)  =  jinc^(r); 
symmetric  case. 


power.  To  gain  more  intuition,  we  concentrate  on  alias-free  im¬ 
ages  throughout  this  section. 


A.  Imaging  With  Spatial  Integration:  CCD  Sampling 


In  real-world  imaging  there  are  other  possible  blurring 
sources  which  change  the  total  PSF  of  the  imaging  system. 
These  blurring  effects  are  usually  modeled  as  a  space-invariant 
functions  and  can  be  therefore  represented  by  a  linear  convolu¬ 
tion  with  the  assumed  PSF  of  theimaging  system.  For  instance, 
imaging  with  a  CCD  can  be  properly  modeled  in  such  a  way. 
To  see  this  effect,  let  us  first  recall  that  in  uniform  standard 
(point)  sampling  scheme,  we  have 


s{xk,yi)  =  s{x,y)\^=k/f„y=i/f,  (64) 

where  fg  is  the  sampling  frequency.  On  the  other  hand,  using  a 
CCD  in  image  gathering  will  result  in  spatial  integration  of  the 
light-field  coming  from  a  (continuous)  physical  scene.  As  an 
example,  CCDs  with  square  area  which  are  uniformly  sensitive 
to  light  will  generate  the  following  discretized  output: 


^ccd(^fc)  yz) 

^  M-l/2)/f,+W/2 

J (1-112)1  f,-Wl2 


Ak-ll2)lfs+Wl2 


l(k-l/2)lf,-Wl2 


s(x,  y)dxdy 


(65) 


Fig.  16.  Simple  structure  illustrating  the  spatial  integration  caused  by  CCD. 

where  W  is  the  dimension  of  each  CCD  cell,io  as  depicted  in 
Fig.  16.  By  defining 

rect(x,y)  =  I  1^1’  1^1  —  (66) 

[  0  otherwise 

as  the  blurring  kernel  of  the  above  CCD,  it  can  be  seen  that  the 
expression  in  (65)  directly  leads  to  the  following: 

^ccd(d'fc)  yi) 

=  s(x,y)  *  *rect(x,y)\^=k/fs,y=i/f,  (67) 

=  I (x,y)*  *h(x,  y)  *  *rect(a:,  (68) 

^ccd (x,y) 

where  I{x,y)  is  the  original  scene  to  be  imaged,  **  denotes 
the  two-dimensional  convolution  operator  and  /iccd('?  •)  is  the 

'•’We  ignore  the  effect  of  fill  factor  not  being  100%. 
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Fig.  17.  Relative  difference  between  the  required  SNR  for  CCD  sampling  and  that  of  point  sampling. 


overall  PSF.  Now  let  SNRccd  denote  the  required  SNR  for  the 
image  collected  hy  the  above  CCD  sampling  scheme.  Fig.  17 
shows  the  relative  difference  between  this  quantity  and  the  re¬ 
quired  SNR  for  the  point  sampling  as  a  function  of  W.  In  this 
example  we  have  considered  the  symmetric  and  over-Nyquist 
case  and  have  presented  the  results  for  both  jinc-squared  and 
Gaussian  PSFs. 

From  a  system  design  point  of  view,  since  typically  a  CCD 
with  larger  effective  area  has  better  noise  characteristics  (i.e., 
smaller  a^),  one  can  optimize  the  size  of  the  CCD  by  using  such 
a  curve  in  effecting  a  trade-off  with  other  parameters  (like  noise 
variance  versus  sampling  rate). 

In  what  follows,  we  investigate  the  effect  of  any  (small)  vari¬ 
ations  in  the  PSF  on  the  required  SNR  in  a  general  framework 
using  perturbation  analysis. 

B.  Variational  Derivative  Approach 

In  this  section  we  concentrate  on  calculating  the  sensitivity  of 
the  required  SNR  to  (known)  variations  in  PSF.  We  recall  from 
Section  II  that  the  expression  for  required  SNR  is  a  function  of 
the  PSF  and  in  particular  its  partial  derivatives  (up  to  the  second 
order).  We  use  well-known  techniques  in  calculus  of  variations 
to  compute  the  overall  variation  in  the  required  SNR. 

To  begin,  consider  the  expression  for  SNR  in  (24)  and  sup¬ 
pose  that  the  point  spread  function  is  changed  by  h{x,  y)  — *■ 


h{x,  y)  +  Ah{x,  y).  By  using  the  concept  of  variational  deriva¬ 
tive  [5],  [24],  [26],  we  can  compute  the  variation  in  SNR  (de¬ 
noted  by  ASNR(/i))  due  to  the  variation  Ah(x,  y).  Namely 


ASNR(/i) 

_  x{Pf,Pd)  e^APoe^QO  -  e^AQ00^P0 
~  ^V2  0^Q00^Q0 

_  XiPf,Pd)  0'^iAP00'^Q  -  AQ00^P)0 
~  ^V2  0^Q00^Q0 


where  ASNR(/i)  =  SNR(/i-h  A/i)  -SNR(/i)  and  AP  and  AQ 
are  perturbations  of  matrices  P  and  Q  defined  in  (40)  and  (43). 
We  have  detailed  the  derivation  of  AQ  and  AP  in  Appendix  E. 
Following  these  derivations,  we  present  the  result  for  the  case 
where  point  sources  are  located  symmetrically  =  Py^Qx  = 
qy  =  0,  a  =  f3  =  1,  and  h{x,y)  =  h{^yx‘^  +  y‘^)  =  h{r) 
(angular  symmetric  kernel).  In  this  case  the  required  SNR  is 
given  by  (44).  Then  the  variation  is  computed  as  the  equation 
shown  at  the  bottom  of  the  page.  As  a  result,  the  relative  change 
in  SNR  is  given  byii 


ASNR(/i) 

SNR 


f-l-oo 


Ah{x,y)Ah{x,y)dxdy 


(71) 


"See  Appendix  E  for  details. 


ASNR(/i) 


KPpPd) 


64  2£’oA£’o  (if’o-E’20  —  -E’lo)  ~  Eq{EqAE2q  +  A£’o-E’2o  —  2EiqAEiq) 
^  iEoE2o  -  E^of 
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Fig.  18.  1-D  cut  of  stretched  (e  =  0.4)  and  compressed  (e  =  —0.4)  versions  of  the  Gaussian  PSF. 


where 


Ah{x,y)  = 


EqE2q  —  e‘Iq 


(72) 

Invoking  the  Cauchy-Schwartz  inequality,  we  have  (73)  shown 
at  the  hottom  of  the  page  which  indicates  that  the  relative  change 
in  SNR  is  hounded  hy  the  energy  in  the  variation  times  a  term 
related  to  energies  of  the  PSF  and  its  derivatives.  As  an  ex¬ 
ample,  consider  the  following  variation  which  corresponds  to 
a  “stretching”  (e  <  0)  or  “compressing”  (e  >  0)  of  the  PSF 

A/i(a;,  y)  =  p{e)h{[l  +  e]x,  [1  -|-  e\y)  -  h{x,  y)  (74) 


is  merely  an  energy  normalization  factor,  A  plot  of  a  1-D  cut 
of  stretched  and  compressed  versions  of  the  Gaussian  PSF  is 
depicted  in  Fig.  18). 

Fig.  19  shows  the  normalized  variation  in  SNR  versus  e  for 
the  jinc-squared  kernel.  As  expected,  with  a  narrower  point 
spread  function  (e  >  0)  less  SNR  is  required  to  resolve  the  point 
sources  and  vice  versa.  In  fact,  we  can  obtain  a  closed-form 
relationship  for  ASNR  in  this  specific  case  assuming  that  the 
image  is  sampled  supercritically.  Let  us  consider  the  expression 
in  (44)  which  includes  the  energy  of  the  PSF  and  its  partial 
derivatives.  Now  let  Eij{e)  denote  the  energy  terms  for  the  new 
kernel  p{e)h{[l  +  e]a;,  [1  -|-  e]y).  It  can  he  shown  that 


where 


= 


J-^  J-^  hHx,y)My 


Eij(€)  =  (l  +  €y+^Eij(0)  (77) 

(75)  *^The  expression  in  (74)  can  be  approximated  by  (see  the  equation  at  the 
bottom  of  the  page.) 


ASNR(/i) 

SNR 


f-l-OO  p-l-oo 
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’  — OO  J  — OO 


'  — OO  J  — OO 
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ASNR(/i)  ^  \fE^^J EqE^q  —  EqE2q  —  4£’q£’io£’30  +  8£’o£’io-E’20  ~  ^^Ef^ 


^-1-00  1^+00 
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Fig.  19.  Variation  in  the  required  SNR  versus  parameter  variation  in  PSF. 


After  some  algebra  we  have 


ASNR 

SNR 


(78) 


which  holds  true  for  any  h{x,y).  The  curve  in  Fig.  19  simply 
shows  this  relationship. 


in  which  the  second  term  on  the  right-hand  side  is  identi¬ 
fied  as  the  “estimator  bias.”  We  are  now  able  to  show  that 
0  +  A6  follows  a  Gaussian  PDF  with  mean  9  -\-h  and  vari¬ 
ance  Using  this,  we  can  compute  the  PDF  of 

AB  -\-  AA6  which  is  again  a  Gaussian  pdf  with  mean  A6  -|-  b 
and  variance  tT^A(H^H)“^  A^.  The  pdf  of  the  suggested  test 
statistics  which  is  compared  to  a  prespecified  threshold  similar 
to  the  expression  in  (13) 


C.  Effects  of  Model  Mismatch  on  Performance 

Another  type  of  analysis  is  to  study  the  case  when  the  as¬ 
sumed  model  of  the  measured  signal  does  not  match  the  true 
model.  This  is  an  interesting  case  study  which  provides  answers 
to  the  question  of  how  much  performance  degrades  due  to  mod¬ 
eling  inaccuracies  or  mismatch. 

Let  us  assume  that  the  actual  point  spread  function  is 
h{x,y)  +  Ah{x,y),  whereas  the  optimal  detector  is  designed 
for  the  point  spread  function  h{x,y)',  so  that  Ah(x,y)  is  the 
mismatch  (unknown  to  the  detector)  between  the  actual  PSF 
and  the  assumed  PSF.  We  observe  that  the  (approximated) 
signal  model  in  this  case  is  now  given  by 

s  +  As  =  (H  +  AH)0  =  s  +  AH0  (79) 

where 

AH  =  [Ah,  Ahio,  Ahoi,  Ah2o,  Aho2,  Ahn]. 

The  measured  signal  will  then  be  g  -|-  AH0.  Consequently,  the 
estimate  of  the  parameter  vector  will  be  changed  to 

0  +  A0  =  (H^H)-iH^g  +  (H^H)-iH^AH0  (80) 

' - V - ' 

b 


+  A0)^A^[A(H^H)-iA^]-iA(0  +  A0)  >  j  (81) 

is  characterized  by  a  noncentral  Chi-squared  pdf  under  both  hy¬ 
potheses  [1 1,  p.  32].  To  this  end,  we  conclude  that  the  detection 
performance  in  the  presence  of  mismatch  is  characterized  by 

Pfih  +  Ah)  =  <3^(2  (7)  (82) 

Pdih  +  Ah)  =  (83) 

where 

Ai  =  ^b^A^[A(H^H)-iA^]-iAb  (84) 

(7^ 

A2  =  +  b)^A^[A(H^H)-iA^]-iA(0  +  b)  (85) 

are  the  resulting  noncentrality  parameters.  It  is  also  worth  em¬ 
phasizing  here  that  in  order  to  obtain  Ai  and  A2,  the  value  of 
in  the  expression  above  needs  to  be  computed  according  to  the 
desired  P^  and  Pf  by  using  the  formula  in  (16). 

Now  as  an  example  we  again  use  (74)  to  compute  the  varia¬ 
tion  in  probability  of  detection  and  false  alarm  rate.  Hereafter, 
we  present  the  results  for  the  case  where  the  desired  detection 
and  false  alarm  rates  are  0.99  and  0.01,  respectively.  Figs.  20 
and  21  show  the  variation  in  probability  of  error  versus  e  (which 
controls  the  stretching  or  compressing  the  PSF  as  in  (74)  for  two 
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Pe=0.01,jinc2  PSF 


Fig.  22.  Variation  in  the  error  rate  versus  d  for  fixed  e;  jinc-squared  PSF. 


Pg=0.01,  jinc2  PSF 


Fig.  23.  The  maximum  tolerable  e  versus  d  for  the  fixed  probability  of  error;  jinc-squared  PSF. 


Firstly  we  observe  that  the  detector  performance  is  severely  af¬ 
fected  for  a  smaller  d  (e.g.,  d  =  0.1).  Roughly  speaking,  for  the 
range  of  d  <  0.3,  the  proposed  detector  completely  fails  if  |e| 
exceeds  /2. 

In  Fig.  22  we  observe  how  the  probability  of  error  changes 
as  a  function  of  d  for  a  given  e  (i.e.,  variation  in  PSF).i3  Also 

l^Hereafter,  we  only  consider  negative  e. 


Fig.  23  depicts  the  variations  in  PSF  which  can  be  tolerated 
such  that  the  probability  of  error  is  lower  than  a  certain  level. 
Clearly,  this  amount  highly  depends  on  the  separation  between 
point  sources.  For  a  small  separation,  even  a  minimal  variation 
in  PSF  can  cause  dramatically  unpleasant  results.  It  is  worth 
noting  that  the  SNR  used  to  generate  these  figures  is  the  same 
as  that  required  for  Pe  =  0.01  under  no  model  mismatch. 
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maximum  s  which  can  be  compensated,  Pg=0.01,  jinc^  PSF 


Fig.  24.  Maximum  tolerable  e  which  can  be  compensated  by  sufficiently  increasing  SNR  versus  given  d;  jinc-squared  PSF. 


Another  interesting  question  in  this  regard  would  he  how 
much  extra  SNR  is  required  to  compensate  the  error  caused  hy 
a  model  mismatch.  To  answer  this  question  let  us  consider  the 
case  where  =  0.99  and  Pf  =  0.01  are  the  desired  detection 
and  false  alarm  rate,  respectively.  To  satisfy  these  conditions 
we  can  calculate  the  threshold  7  which  must  he  equal  to  15.1  in 
(13).  However,  if  there  exists  any  mismatch  caused  hy  the  vari¬ 
ation  in  PSF,  according  to  (84)  and  (85),  hy  some  calculations, 
we  have  Ai  >  33.5  and  A2  <  0.008  in  order  to  achieve  the  error 
rates  above.  In  other  words,  for  a  given  Ah(x,  y),  increasing  the 
SNR  can  provide  the  desired  detection  accuracy  only  if  the  fol¬ 
lowing  inequalities  hold  simultaneously: 

^b^A^[A(H^H)-iA^]-iAb  <  0.008  (89) 

+  b)^A^[A(H^H)-iA^]-iA(0  +  b)  >  33.5  (90) 

We  note  that  the  first  inequality  enforces  the  pdf  of  the  test 
statistic  under  TYq  to  have  smaller  noncentrality  parameter  so 
that  the  required  is  accessible,  whereas  the  second  inequality 
plays  the  reverse  role  for  the  pdf  under  Pi.  We  can  unify  the  in¬ 
equalities  in  (89)  and  (90)  as 

{6  +  b)^A^[A(H^H)-^A^]-^A(g  +  b) 

b^A^[A(H^H)-iA^]-iAb 

(91) 

Under  the  above  condition  a  sufficiently  high  value  of  SNR  can 
compensate  the  effect  of  model  mismatch.  To  be  more  realistic 
let  us  also  carry  out  the  analysis  for  the  case  where  we  allow 
the  probability  of  error  to  be  equal  to  0.02  (i.e.,  P^  =  0.98  and 
Pj  =  0.02).  The  threshold  7  remains  the  same.  However,  to 
satisfy  the  new  conditions  we  only  need  A  >  30.5  and  A  <  0.66. 


In  other  words  it  is  possible  to  have  an  error  rate  less  than  0.02 
if 

{6  +  b)^A^[A(H^H)-^A^]-^A(g  4-  b) 
b^A^[A(H^H)-iA^]-iAb 

(92) 

As  an  illustration  of  the  above  analysis,  let  us  consider  the  case 
where  PSF  undergoes  the  same  effect  as  in  (74).  Fig.  24  shows 
the  amount  of  mismatch  which  can  be  compensated  by  presum¬ 
ably  high  SNR  (theoretically  as  SNR  — *■  00)  for  a  given  dis¬ 
tance  d.  We  demonstrate  two  cases:  the  case  where  no  extra 
error  can  be  tolerated  (APe  =  0)  and  the  case  where  we  allow 
APe  =  0.01.  The  results  clearly  indicate  that  specially  for  small 
distance  between  point  sources  the  detector  is  extremely  sensi¬ 
tive  to  (unknown)  variation  of  PSF. 

VI.  Practical  Aspects  and  Extensions 

In  this  section,  we  briefly  discuss  some  future  extensions  and 
practical  aspects  of  the  analyzes  developed  in  the  paper. 

A.  Application  in  Astronomical  Imaging  and  Image 
Processing:  A  Primary  Case  Study 

A  complete  extension  of  the  current  work  to  a  real  problem  in 
astronomy  requires  a  careful  study  and  is  beyond  the  scope  of 
this  paper.  Typically,  the  question  of  resolvability  in  astronomy 
introduces  new  aspects  to  the  problem  such  as  presence  of  other 
interfering  phenomena,  objects,  clutter,  non-Gaussian  noise  and 
also  time-varying  PSF.  Each  of  these  issues  should  be  carefully 
treated  and  incorporated  in  the  analysis.  As  an  example,  we 
have  addressed  the  issue  of  variation  and  mismatches  in  PSE 
in  Section  V. 
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Fig.  25.  Original  image. 


Fig.  27.  Result  of  applying  CLEAN  method  to  the  image  in  Fig.  26. 


Fig.  26.  Measured  image.  Fig-  28.  The  result  of  modified  CLEAN  (CLEAN  associated  with  the  local 

detectors). 


In  this  section  we  present  a  basic  application  of  the  proposed 
detector  to  a  related  but  slightly  simplified  problem.  The  under¬ 
lying  image  to  be  restored  is  a  collection  of  point  sources  (or 
more  precisely  pairs  of  closely  spaced  point  sources)  and  the 
collected  data  is  a  discrete  noisy  and  blurred  version  of  it  as 
shown  respectively  in  Figs.  25  and  26. 

In  the  astronomical  data  analysis  community,  a  well-known 
deconvolution  method  named  CLEAN  is  widely  used  [23]. 
CLEAN  strictly  assumes  that  the  underlying  image  consists 
of  point  sources  and  estimates  their  locations  and  intensities. 
Eurthermore,  the  PSE  is  assumed  to  be  known.  It  first  finds  the 
point  which  has  the  maximum  brightness  (xq,  ya)  and  subtracts 
the  effect  of  the  (estimated)  point  sources  (i.e.,  the  convolution 
of  6{x  —  Xo,y  —  yo)  and  the  PSE).  Then  the  resulted  signal 
is  iteratively  used  to  find  and  remove  other  point  sources  until 
the  energy  of  the  residual  signal  reaches  a  prespecified  noise 
level.  The  result  of  applying  this  method  to  the  image  in  Eig.  25 
is  demonstrated  in  Eig.  27.  Obviously,  the  use  of  CLEAN  is 
limited  by  the  width  of  PSE,  since  any  pair  of  point  sources 


which  are  spaced  within  a  distance  related  to  the  width  of  the 
PSE  is  identified  as  a  single  point  source. 

We  argue  that  the  resolvability  of  CLEAN  can  be  significantly 
improved  by  employing  the  local  detector  we  have  proposed  in 
this  article.  We  include  our  local  detection  machinery  in  this  ap¬ 
proach  in  the  following  way.  Eirst  instead  of  searching  for  the 
point  with  the  maximum  brightness,  we  find  the  point  with  the 
largest  intensity  after  convolving  the  data  with  PSE.  This  ap¬ 
proach  will  help  us  to  locate  the  midpoint  of  the  pairs  of  point 
sources  within  a  properly  chosen  window.  Once  this  point  is 
identified,  one  can  conveniently  set  up  the  local  detector  to  de¬ 
termine  the  number  of  point  sources  in  the  neighborhood  of  the 
point  and  also  estimate  the  distance  between  point  sources  and 
their  intensities. 

A  basic  implementation  of  this  approach  was  carried  out  and 
the  result  of  applying  it  to  Eig.  25  is  shown  in  Eig.  28,  which 
closely  follows  the  structure  of  the  original  data.  To  have  a  better 

''^A  quantitative  proof  of  the  optimality  of  this  approach  is  given  in  [20],  [18]. 
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Fig.  29.  Residual:  difference  between  the  measured  image  and  convolution  of 
Fig.  28  and  PSF. 

indication  of  the  claimed  improvement,  see  the  image  in  Fig.  29 
which  is  the  difference  between  the  collected  image  (Fig.  26) 
and  the  convolution  of  the  output  image  (Fig.  28)  with  the  PSF. 

B.  An  Application  to  Physical  Fault  Detection  in  Integrate 
Circuit  Manufacturing 

Integrated  circuit  manufacturing  processes  require  transfer¬ 
ring  a  circuit  pattern  (including  logic  gates,  memory  cells,  etc.) 
onto  the  silicon  wafers.  This  process  is  referred  to  as  lithog¬ 
raphy.  However,  because  of  several  factors  (like  irregularity  of 
surface  or  mask  imperfection)  the  above  process  may  introduce 
some  errors  in  replicating  the  pattern.  These  errors  translate  into 
short  or  open  circuits  which  make  the  manufactured  IC  use¬ 
less.  Hence,  there  is  a  need  for  metrology  and  inspection  to  de¬ 
tect  these  physical  defects  or  faults.  The  commonly  employed 
method  is  to  image  the  printed  wafer  using  an  SEM  (Scanning 
Electron  Microscope)  and  to  inspect  this  image  by  matching 
with  the  desired,  expected  pattern.  Over  time,  the  dimensions  of 
the  printed  circuits  are  becoming  increasingly  small  (currently 
65  nm),  and  this,  coupled  with  the  resolution  limits  of  the  SEM, 
lead  to  a  heavily  blurred  and  noisy  version  of  the  desired  image 
of  the  wafer.  We  have  briefly  explained  how  to  employ  and  ex¬ 
tend  the  proposed  detection  frameworks  to  such  inspection  tasks 
in  [17].  However  the  developed  detector  evaluates  the  correct¬ 
ness  of  a  circuit  or  mask  in  a  simple  way  and  for  comprehensive 
and  practical  purposes  the  method  we  propose  needs  to  be  gen¬ 
eralized.  Such  generalization  requires  better  understanding  of 
the  physical  models  of  the  underlying  systems. 

VII.  Conclusion 

In  this  paper  we  investigated  the  problem  of  resolving  two 
closely  spaced  point  sources  (beyond  the  Rayleigh  limit)  from 
noise-corrupted,  blurred,  and  possibly  undersampled  images. 
We  have  studied  three  different  frameworks  to  derive  perfor¬ 
mance  limits  on  minimum  resolvability,  namely,  a  detection- 
theoretic  approach,  an  estimation-theoretic  analysis,  and  an  in- 
formation-theoretic  approach.  We  have  discussed  the  case  of 
under-Nyquist  sampling  in  several  parts  of  this  paper  because  of 


its  significance  in,  for  example,  image  super-resolution  recon¬ 
struction  [3],  [15],  [4].  To  our  knowledge,  this  paper  presents 
the  first  direct  analysis  of  resolution  limits  in  imaging  for  un¬ 
dersampled  images. 

Eor  the  detection-theoretic  approach,  we  have  put  forward  a 
hypothesis  testing  framework  and  derived  the  relationship  be¬ 
tween  the  required  SNR  and  the  detectable  distance  between 
point  sources.  We  explicitly  found  a  general,  fundamental,  and 
informative  relationship  to  quantify  a  (statistical)  measure  of 
resolution,  and  also  to  reveal  the  effect  of  point  spread  function 
and  sampling  parameters  on  resolvability.  The  established  ex¬ 
pressions  are  in  general  applicable  to  any  point  spread  function 
and  any  arbitrary  sampling  scheme.  The  analysis  for  the  under¬ 
sampled  images  demonstrates  explicitly  how  the  performance 
is  affected  by  the  sampling  phases  (sampling  offset),  while  in 
the  over-Nyquist  case  the  performance  is  independent  from  the 
sampling  offset. 

In  the  process  of  addressing  the  fundamental  relationship  be¬ 
tween  SNR  and  resolvability,  we  also  proposed  a  corresponding 
detector  which  can  be  implemented  in  practice  to  estimate  the 
locations  and  intensities  of  point  sources  from  a  degraded  (noisy 
and  blurred)  image.  As  an  example  of  implementing  such  a  tech¬ 
nique,  one  can  first  apply  a  running  windowed  energy  detector 
to  determine  whether  a  pattern  (generated  by  a  point  source) 
within  the  window  is  present  or  not.  Then  as  a  refinement  step, 
the  proposed  detector  will  be  applied  to  resolve  the  overlapping 
patterns  originated  by  two  closely-spaced  point  sources. 

We  studied  the  asymptotic  performance  of  the  ML  estimate 
of  the  unknown  parameters,  using  the  Eisher  information  ma¬ 
trix.  Deriving  a  lower  bound  on  the  variance  of  the  parameter 
d,  in  particular,  was  rather  helpful  in  confirming  the  results 
of  detection-theoretic  analysis  and  also  justifying  the  effect 
of  estimation  accuracy  on  the  performance  of  the  proposed 
detector.  We  also  derived  the  symmetric  Kullback-Liebler  dis¬ 
tance  for  the  underlying  problem  by  extending  its  standard  form 
to  higher  order  terms.  This  analysis  provides  an  upper  bound 
on  the  detection  performance  we  have  derived  in  Section  II 
and  also  connects  the  Eisher  information  matrix  with  this 
performance  bound. 

As  a  practical  matter,  we  presented  a  sensitivity/variational 
analysis  to  find  the  effect  of  known  or  unknown  variation  of 
PSE  on  the  performance  of  the  detector  discussed  in  Section  II. 
The  results  of  this  part  are  useful  to  compute  the  effect  of  a 
known  variation  of  PSE  on  performance  of  an  imaging  system 
and  to  quantify  the  degradation  of  such  a  system  due  to  imper¬ 
fect  knowledge  of  the  model  of  the  imaging  system. 

Eurthermore,  we  have  briefly  discussed  some  of  the  practical 
uses  of  the  proposed  methodology  to  other  fields  such  as  astro¬ 
nomical  data  analysis  tasks  and  circuit  inspection  applications. 

As  for  future  directions,  one  open  area  is  to  extend  this  frame¬ 
work  to  a  more  complicated  scenario  where  the  PSE  is  partially 
known  or  is  associated  with  some  stochastic  parameters.  An¬ 
other  direction  is  to  study  the  case  of  unknown  motion  between 
frames.  Also,  as  mentioned  before,  the  problem  of  resolving 
point  sources  is  a  canonical  case  study  in  imaging  and  image 
processing.  We  believe  more  general  imaging  problems  can  be 
treated  with  the  same  techniques  employed  in  this  paper,  with 
proper  extension  (and  complication)  of  the  imaging  models. 

Resolution  limit  in  photon-limited  imaging  systems  is  an¬ 
other  important  case  study  that  needs  to  be  investigated.  The 
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noise  in  these  systems  is  no  longer  additive  readout  noise  and 
follows  the  Poisson  process.  Also,  in  many  cases  the  underlying 
image  itself  may  contain  different  forms  of  noise  and  in  partic¬ 
ular  clutter  or  unwanted  interference  which  are  also  described 
hy  a  prohahilistic  function. 

Another  challenging  direction  is  to  investigate  the  perfor¬ 
mance  of  the  imaging  systems  with  time-varying  characteris¬ 
tics  (for  example,  time-varying  PSF  particularly  in  astronomical 
applications).  Such  behavior  may  be  modeled  by  some  proba¬ 
bilistic  descriptions  as  well.  The  analysis  here  requires  a  some¬ 
what  more  sophisticated  machinery. 

We  have  briefly  studied  the  resolution  problem  for  the  case 
where  the  measured  signal  is  contaminated  by  Poisson  noise  (for 
example  in  photon-limited  imaging)  in  [17].  This  is  an  important 
case  study  in  astronomical  applications  and  SEM  imaging. 


I 

E" 


Fig.  30.  E°,  _E“,  S'" ,  and  are  the  energy  of  the  discrete  signals  within 
depicted  areas. 


by  using 


Appendix  A 

Angular  Symmetric  PSF 

Let  us  consider  the  case  of  angular  symmetric  PSF  {h{x,  y)  = 
+  y‘^)  =  q{r))  such  as  that  corresponding  to  circular 
aperture.  Following  relationships  for  the  partial  derivatives  can 
be  written  in  the  polar  form  {r,6): 


dh(x,y)  dq(r) 

dx  dr 

dh(x,y)  dq(r) 


cos(d) 

sin(0) 


d‘^h{x,y)  _  d‘^q{r)  ,  dqjr)  sin^(6>) 

dx"^  dr‘^  dr  r 

d‘^h(x,y)  ^  d'^qjr)  .  dq{r)  cos^ (6) 

dy2  Qj.2  Qj,  j. 

oyox  or^  or  r 

After  doing  some  math  and  by  applying 

A27r  p2'ir 

/  sin(0)  cos(0)d0  =  /  sin(0)  cos^(0)d0  =  0 
Jo  Jo 

p2'ir  p2'ir 

/  sin^(6l)d6l  =  /  cos^(6l)d6l  =  vr 

Jo  Jo 

p2'ir  p2'ir  o 

/  sin4(0)d0  =  /  cos4(0)d0  =  — 

JO  JO  4 

f‘27V 

/  sin^(6l)  cos^(6l)d6l  =  — . 

./n  4 


We  can  show  that 


rg^(r)dr 


-Eio  —  £"01  —  TT 


£20  —  £02  —  — 


y  dr  \ 
3  dq{r) 
^  r  dr 


Ett  —  — 


d‘^q{r) 


1  dq(r) 
r  dr 


d‘^q{r) 


dq{r)  d‘^q(r] 


dr  =  0. 


Therefore 


SNR  = 


A(P/,  Pd)  64£o  -  16d^£io  +  dJE2 


-  If) 


Appendix  B 

Sampling  Theory  for  Under-Nyquist  Measurements 

To  begin,  let  us  first  mention  that  for  the  case  of  over-Nyquist 
sampling,  following  relationships  are  hold: 

Eij  = 

=  4^  [  [  u^v^\H(u,v)\‘^dudv  (98) 


J-^  I  djfdyi  J 

Whereas  in  the  under-Nyquist  case,  assuming  that  E[{u,v)  is 
symmetric  (real  h{x,y))  and  band  limited  to  —  £„  <  u  <  Bu 
and  —Bv  <  v  <  By,  we  will  have  (100)-(103)  shown  at  the 
bottom  of  the  next  page  where  5R  denote  the  real  part,  is  For 
instance,  if  £„  <  fs  <  2Bu  and  By  <  fg  <  2By  we  will 
similarly  have 

Ei  =  Ef  +  Ef  cos(27r<^)  +  Ef  cos(27r^) 

+Er  cos(27r(/)  +  270/^)  (105) 
where  Ef  denote  the  energy  of  signal  considering  there  is  no 
aliasing  effect  and  £|‘,£F  and  Ef^  are  the  terms  related  to 
aliasing  in  horizontal,  vertical  and  diagonal  directions  as  de¬ 
picted  in  Fig.  30. 

It  is  worth  mentioning  that  for  the  general  energy  terms 
(Eiji  ^  j)  we  will  have  sin(  • )  terms  as  well 


dr  (96) 


-l-£Ii.g^^27rV>  -1- £MDg^^27r<f)-|-^^27rV>  _  (106) 

l^We  have  used  the  following  identity  to  derive  the  above  results 


=E  |ad"  +  2  EE  Sf{c 
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Appendix  C 

Derivation  of  the  Fisher  Information  Matrix 

Let  and  O2  denote  the  sets  of  samples  of  the  first  and  the 
second  frames,  respectively.  From  (53),  the  Fisher  Information 
matrix  can  he  written  as 


A  =  Z^Z 


(107) 


^m  =  ^EE( 

h  1  \ 


ds{xk,yi) 

dd 


(117) 


(118) 


Noting  that!® 


aV(g,  d) 


dd^ 


=  0  i  =  l,3. 


(120) 


d=0 


where  Z  =  [zi,  Z2,  Z3, . . . ,  Zg]  is  defined  in  (108)-(1 15)  shown 
at  the  bottom  of  the  next  page. 

Appendix  D 

Computing  the  Kullback-Liebler  Distance  in  (57) 

Directly  using  the  results  in  [12,  p.  26],  we  can  obtain  the 
following  expression  for  KLD: 

J'(d)  «  d^I{0)  (116) 

where  1(d)  is  the  Fisher  Information  measure  [10,  p  40]. 


However,  for  the  hypothesis  test  of  interest  in  (4),  /(O)  is  zero 
and  (116)  is  not  directly  applicable.  Here  we  extend  the  ap¬ 
proach  in  [12,  p.  26]  by  considering  higher  order  terms.  Con¬ 
sider  the  following  Taylor  expansion: 

J(d)  =  ^[p(g,<i)  -p(g,0)]log  <ig 

=  J(0)  +  rfg 


We  will  have 

J7(0)  =  0 
dj 


^  (f  d^J 


6  d(f 


d=0 


^24  dd^ 


+  0{<f). 


d=0 


dd 


d^J 


d=0 


r  dp(g,  d)  / p(g,  d) 
dd  ®Ug,o) 

dp(g,d) 


+b(g,c?) 


dg  =  0 


d=0 


dd? 

d^J 


dp(g,d) 

dd 


1  2 


dd? 


d=0 


d=0 


Iv  pis,  d) 

\d=0 

!•  adp{g,d)  d^p(s,d) 

/  dd 


dg  =  0 


tv 

3 


pis,  d) 

3 


dp(g,d) 

dd 


[pis,  d)Y" 


dg  =  0, 


d^J 


dd^ 


d=0 

)  dp(g,d)  a^p(g,d) 


dd 


dd^ 


+  6 


d  P{g,d) 

dd? 


1  3 


d=0 


I'D 


18 


dp(g,d) 

dd 


1  2 


pis,  d) 

d'^p(g,d) 

dd? 


c? 

d=o  2  d(P' 

d=(S 

8 

dp(g,d) 

dd 

[pis,  0!)]' 


dg 


d=0 


(1 19)  *®Since  p(g,  d)  is  an  even  (and  differentiable)  function  around  <1  =  0. 


1 


-TT  J —TZ  LuLv 


Lu  1  Lv  1 


^  Y2  Hiu-  2'Ki,  V  -  2'Kj} 


,^/^[(u—2-Ki)4>+(v—2-Kj)tp] 


i=0  j=0 

Lu  1  Lv  1 


dfidu 


7-7r  LuLv 


\H(u  —  2'Kj,  V  —  2Kj)YdLudv 


•TT  ^TT  r)  1  1  Lu  1  L^  1 

/  E  E  dndr; 

X  ?t[H(u  —  2Ki,  V  —  2Kj)H*(u  —  2Ki' ,  v  —  2k j') 

^  gC^[(u-27ri)<()-|-(n— 27ri)-i/i— (u— 27ri')<^-(n-27ri')i/>]j 


1 

dvr^ 


nTvLv  P'lzLxt 


?z  L/'ij  d  '7Z  L/ni 


\H(u,  r;)|^d'udr;  -|- 


/— TT  J—7V  LuLy 


Lu  1 L^  1  Lu  1  L'u  1 

E  E  E  E 


i=0  i=0  i'=i-\-1  j'=j-\-1 

X  ^H(u  -  2Ki,  V  -  2Kj)H*(u  -  2Ki',  V  -  27r/)e^[2rr(i-i')<^+27r(i-i')V>]] 

/-l-oo  ^+00 

/  h‘^(x,y)dxdy 


+ 


—  00  J  —00 

Zto"  TiOii  cos(2vr(z  -  i')<j  +  2vr(i  -  f)i^) 

JY_^H(u  —  2Ki,  V  —  2Kj)H(u  —  2Ki' ,  v  —  2Kj')dudv 


(100) 


(101) 


(102) 


(103) 
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6 


d  P(s,d) 

dcP 


Jt>  p{s,  d) 

As  a  result,  we  can  write  (118)  as 


dg. 


d=0 


d‘^p(g,d) 

dd? 


-\  2 


Kg:  d) 


dg. 


(121) 


d=0 


On  the  other  hand,  we  observe  that 


dip 


f  d^  lnp(g,  d) 


d=0 


dd? 


I'D 


dd? 


d  lnp(g 
dd~ 


dP  Jv  PiE,  d) 

„  n9p(g,d)  d^p(g,d) 


pis,  d) 


dg 


dg 


d=0 


d=0 


dd 


dd? 


+  2 


9  P{g,d) 
dd? 


1  3 


ID 


pis,  d) 


K 

9p(g,d) 

^  d'^p(g,d) 

0 

dd 

dd? 

b(g,  d)Y 

d'^p(g,dp  ^ 

dd? 


Id 


pis,  d) 


+ 


dg. 


dp{g,d) 

dd 


b(g:  dW 


dg 


d=0 


(122) 


d=0 


Therefore,  we  show  (123)  and  (124)  at  the  bottom  of  the  next 
page.  As  we  see  from  (123),  the  divergence  for  the  underlying 


hypothesis  testing  problem  is  directly  related  to  the  second 
derivative  of  the  Fisher  information  matrix  evaluated  at  d  =  0. 


Appendix  E 

Derivation  of  AQ  and  AP  in  (70) 
Computing  AQ  and  AP  requires  the  following: 

f.+oo  /.+00 


AEoih)  =  fsj^ 
AEioih)  =  /. 

=  L 


— oo  «/  — oo 
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where  we  assume  that 
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as  a;  or  y  — *•  ±oo.  Therefore,  we  show  (125)-(126)  at  the  bottom 
of  the  page,  where 
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